# SoC Design Laboratory

# Lab3 AXI4\_Lite & AXI4\_Stream interface design

111061560, Electrical Engineering, M.S. second year, 吳俊鋌

### 1. Block Diagram:



### 2. Description of operation:

#### FSM:

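The computation in this lab follows the BRAM shift-data approach used in the lab\_2 C code. The FSM has six states: TAP, READ, R\_BRAM, W\_BRAM, ADD, and OUTPUT. The TAP state initializes the coefficients. Once all taps have been received, ap\_start rises and the FSM enters the READ state to receive one data value before moving on. It then loops through R\_BRAM, W\_BRAM, and ADD: the (n-1)-th entry is read in R\_BRAM, written to the n-th position in W\_BRAM, and the value that was read is multiplied and accumulated in the ADD state. The loop repeats until the whole convolution is complete, after which the FSM enters the OUTPUT state, raises sm\_tvalid, and waits for the output to be accepted.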
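The state sequence described above can be sketched as a small transition function. This is a behavioral illustration in Python rather than the RTL itself. The condition names (`data_received`, `taps_done`, `output_accepted`) are hypothetical, and the return from OUTPUT to READ is an assumption, since the text does not say where the FSM goes after an output is accepted.

```python
from enum import Enum, auto


class State(Enum):
    TAP = auto()
    READ = auto()
    R_BRAM = auto()
    W_BRAM = auto()
    ADD = auto()
    OUTPUT = auto()


def next_state(state, *, ap_start=False, data_received=False,
               taps_done=False, output_accepted=False):
    """Return the state for the next clock (condition names are hypothetical)."""
    if state is State.TAP:
        # Stay in TAP until the coefficients are loaded and ap_start rises.
        return State.READ if ap_start else State.TAP
    if state is State.READ:
        # Wait for one input sample on the stream interface.
        return State.R_BRAM if data_received else State.READ
    if state is State.R_BRAM:
        return State.W_BRAM
    if state is State.W_BRAM:
        return State.ADD
    if state is State.ADD:
        # Loop over the taps until the convolution for this sample is done.
        return State.OUTPUT if taps_done else State.R_BRAM
    if state is State.OUTPUT:
        # Assumption: go back to READ once the output has been accepted.
        return State.READ if output_accepted else State.OUTPUT
    raise ValueError(f"unknown state: {state}")
```

Stepping the function with `taps_done=False` reproduces the R\_BRAM, W\_BRAM, ADD loop, and setting `taps_done=True` in ADD leaves the loop for OUTPUT.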

#### TAP AXI4\_Lite:

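Taps are transferred with the AXI4\_Lite protocol. When awvalid is 1, awready rises to 1 on the next clock. When awvalid and awready are both 1, awaddr is captured into the BRAM address (tap\_A); at the same time wready is set to 1 and awready is cleared, and the design waits for wvalid. When wvalid and wready are both 1, wdata is captured into the BRAM data input (tap\_Di) and wready is cleared.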
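A cycle-level sketch of this write handshake, assuming one call per rising clock edge with the inputs sampled at that edge. The class name `AxiLiteWriteSlave` and the `writes` log are hypothetical helpers for illustration; only the signal names come from the description above.

```python
class AxiLiteWriteSlave:
    """Behavioral model of the address-then-data write handshake."""

    def __init__(self):
        self.awready = 0
        self.wready = 0
        self.tap_A = None    # captured write address
        self.tap_Di = None   # captured write data
        self.writes = []     # (address, data) pairs, for inspection only

    def clock(self, awvalid, awaddr, wvalid, wdata):
        """Advance one clock edge."""
        if awvalid and self.awready:
            # Address handshake: capture awaddr, then open the data channel.
            self.tap_A = awaddr
            self.awready = 0
            self.wready = 1
        elif wvalid and self.wready:
            # Data handshake: capture wdata and close the data channel.
            self.tap_Di = wdata
            self.wready = 0
            self.writes.append((self.tap_A, self.tap_Di))
        elif awvalid and not self.wready:
            # awvalid seen: awready rises on the next clock.
            self.awready = 1
```

With awvalid and wvalid both held high from the start, the model needs three clocks per write: one to raise awready, one for the address handshake, and one for the data handshake.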

#### DATA AXI4\_Stream:

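Data is transferred with the AXI4\_Stream protocol. When ss\_tvalid is received, ss\_tready is set to 1. When ss\_tvalid and ss\_tready are both 1, ss\_tdata is captured into data\_temp and ss\_tready is cleared. When the computation finishes, sm\_tvalid rises; when sm\_tvalid and sm\_tready are both 1, sm\_tdata is output and sm\_tvalid is cleared. If the output is the last one, sm\_tlast rises.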
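The two stream handshakes can be modeled the same way, one call per clock edge. This is a simplified sketch: the class names `StreamIn` and `StreamOut` and the `result_ready` method are hypothetical, and the model ignores the FSM state that gates when a sample may be accepted.

```python
class StreamIn:
    """Input side: accept one sample per ss_tvalid/ss_tready handshake."""

    def __init__(self):
        self.ss_tready = 0
        self.data_temp = None

    def clock(self, ss_tvalid, ss_tdata):
        """Return True on the clock edge where a sample is captured."""
        if ss_tvalid and self.ss_tready:
            self.data_temp = ss_tdata
            self.ss_tready = 0
            return True
        if ss_tvalid:
            self.ss_tready = 1  # ready rises after valid is seen
        return False


class StreamOut:
    """Output side: hold sm_tvalid until sm_tready accepts the result."""

    def __init__(self):
        self.sm_tvalid = 0
        self.sm_tdata = 0
        self.sm_tlast = 0

    def result_ready(self, value, is_last):
        """Called when the computation for one sample has finished."""
        self.sm_tvalid = 1
        self.sm_tdata = value
        self.sm_tlast = int(is_last)

    def clock(self, sm_tready):
        """Return (sm_tdata, sm_tlast) when the transfer completes, else None."""
        if self.sm_tvalid and sm_tready:
            out = (self.sm_tdata, self.sm_tlast)
            self.sm_tvalid = 0
            self.sm_tlast = 0
            return out
        return None
```

Because ss\_tready only rises after ss\_tvalid has been observed, each input sample takes at least two clocks in this model.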

#### Shift RAM & TAP RAM:

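The TAP RAM is used in two states, TAP and R\_BRAM. The TAP state handles reading and writing the taps: when awaddr > 20 and awvalid and awready are both 1, tap\_A is set to awaddr-20 so that the first tap lands in the first BRAM location, and tap\_Di is written at the same time. In the R\_BRAM state only reads are performed: the address to be read is driven in R\_BRAM and the tap is read out in the W\_BRAM state, which completes the TAP BRAM read and write operations.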

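For the Shift RAM, the read address is driven in R\_BRAM and the data is read out on the next clock. In W\_BRAM, the write address and the data read in the R\_BRAM state are driven together, which produces the RAM shift effect.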
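The read-then-write shift combined with the multiply-accumulate step can be sketched on plain Python lists. The function name `fir_step` and the list-based RAM are illustrative assumptions; the real design works on BRAM words, one address per state, and its tap count and word width are not restated here.

```python
def fir_step(data_ram, taps, new_sample):
    """Shift data_ram by one slot and return the FIR output for new_sample."""
    acc = 0
    # Walk from the oldest slot down to slot 1: read slot n-1 (R_BRAM),
    # write it into slot n (W_BRAM), then multiply-accumulate (ADD).
    for n in range(len(data_ram) - 1, 0, -1):
        moved = data_ram[n - 1]
        data_ram[n] = moved
        acc += moved * taps[n]
    # The newest sample goes into slot 0 and is weighted by the first tap.
    data_ram[0] = new_sample
    acc += new_sample * taps[0]
    return acc
```

For example, with three taps `[1, 2, 3]` and an initially zeroed RAM, feeding the samples 1, 2, 3 returns 1, 4, 10, which matches the direct convolution sum.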

#### ap control:

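ap\_start rises when awaddr is 0 and wdata is 1. After ap\_start rises, ap\_idle goes to 0 and stays there until ss\_tlast arrives, at which point ap\_idle returns to 1. ap\_done is set to 1 on the last data item (count+1 = data\_length). After that result has been transferred out, ap\_done is cleared to 0 when awaddr is 0 and rdata is 2 or 6, which ends the whole computation.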
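The values 2 and 6 are consistent with a status word in which bit 0 is ap\_start, bit 1 is ap\_done, and bit 2 is ap\_idle. That bit layout is an assumption here, as the text does not state it, and the helper names below are hypothetical.

```python
AP_START_BIT = 0  # assumed bit positions, not stated in the text
AP_DONE_BIT = 1
AP_IDLE_BIT = 2


def status_word(ap_start, ap_done, ap_idle):
    """Pack the three control flags into the word read at address 0."""
    return ((ap_start << AP_START_BIT)
            | (ap_done << AP_DONE_BIT)
            | (ap_idle << AP_IDLE_BIT))


def clears_ap_done(addr, rdata):
    """True when a read of address 0 returns done (2) or done plus idle (6)."""
    return addr == 0 and rdata in (2, 6)
```

Under this layout, `status_word(0, 1, 0)` is 2 (done only) and `status_word(0, 1, 1)` is 6 (done and idle), the two values that clear ap\_done.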

### 3. Resource usage:

FF number: 176, LUT number: 315

| Site Type             | Used | Fixed | Prohibited | Available | Util% |
|-----------------------|------|-------|------------|-----------|-------|
| Slice LUTs*           | 315  | 0     | 0          | 53200     | 0.59  |
| LUT as Logic          | 315  | 0     | 0          | 53200     | 0.59  |
| LUT as Memory         | 0    | 0     | 0          | 17400     | 0.00  |
| Slice Registers       | 176  | 0     | 0          | 106400    | 0.17  |
| Register as Flip Flop | 176  | 0     | 0          | 106400    | 0.17  |
| Register as Latch     | 0    | 0     | 0          | 106400    | 0.00  |
| F7 Muxes              | 0    | 0     | 0          | 26600     | 0.00  |
| F8 Muxes              | 0    | 0     | 0          | 13300     | 0.00  |

### 4. Timing Report:

Max frequency: 90.909 MHz; clock period: 11 ns (waveform {0.000 5.500})

| Clock    | Waveform(ns)  | Period(ns) | Frequency(MHz) |
|----------|---------------|------------|----------------|
| axis_clk | {0.000 5.500} | 11.000     | 90.909         |
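The frequency in the table follows directly from the clock period. As a small worked check, the sketch below converts a period in ns to MHz and, as an illustration only, shows the frequency that would correspond to shortening the period by the 0.312 ns setup slack reported for the worst path. The second figure is an estimate derived from the report, not a result the report itself states.

```python
def freq_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in MHz."""
    return 1000.0 / period_ns


constrained = freq_mhz(11.0)          # the 11 ns constraint used in this lab
estimated = freq_mhz(11.0 - 0.312)    # period reduced by the reported slack

print(f"{constrained:.3f} MHz")
print(f"{estimated:.3f} MHz")
```

A re-run of synthesis with a tighter constraint would be needed to confirm whether the estimated figure is actually reachable.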

```
Max Delay Paths
Slack (MET) :             0.312ns  (required time - arrival time)
  Source:                 FSM_onehot_state_reg[3]/C
                            (rising edge-triggered cell FDCE clocked by axis_clk {rise@0.000ns fall@5.500ns period=11.000ns})
  Destination:            product_reg[31]/D
                            (rising edge-triggered cell FDCE clocked by axis_clk {rise@0.000ns fall@5.500ns period=11.000ns})
  Path Group:             axis_clk
  Path Type:              Setup (Max at Slow Process Corner)
  Requirement:            11.000ns  (axis_clk rise@11.000ns - axis_clk rise@0.000ns)
  Data Path Delay:        10.551ns  (logic 7.856ns (74.454%)  route 2.695ns (25.546%))
  Logic Levels:           9  (CARRY4=4 DSP48E1=2 LUT2=2 LUT3=1)
  Clock Path Skew:        -0.145ns (DCD - SCD + CPR)
    Destination Clock Delay (DCD):    2.128ns = (13.128 - 11.000)
    Source Clock Delay      (SCD):    2.456ns
    Clock Pessimism Removal (CPR):    0.184ns
  Clock Uncertainty:      0.035ns  ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE
    Total System Jitter     (TSJ):    0.071ns
    Total Input Jitter      (TIJ):    0.000ns
    Discrete Jitter          (DJ):    0.000ns
    Phase Error              (PE):    0.000ns
```
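The skew and uncertainty figures in the report header can be reproduced from the formulas printed next to them. The sketch below plugs in the reported values; the function names are hypothetical, and the results differ from the printed figures in the third decimal place, presumably because the report rounds its internal values for display.

```python
import math


def clock_path_skew(dcd, scd, cpr):
    """Skew = DCD - SCD + CPR, as printed in the report header."""
    return dcd - scd + cpr


def clock_uncertainty(tsj, tij, dj, pe):
    """Uncertainty = ((TSJ^2 + TIJ^2)^1/2 + DJ) / 2 + PE."""
    return (math.sqrt(tsj ** 2 + tij ** 2) + dj) / 2 + pe


skew = clock_path_skew(dcd=2.128, scd=2.456, cpr=0.184)
uncertainty = clock_uncertainty(tsj=0.071, tij=0.0, dj=0.0, pe=0.0)

print(f"skew = {skew:.3f} ns")
print(f"uncertainty = {uncertainty:.4f} ns")
```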

| Location | Delay type                             | Incr(ns) | Path(ns) | Netlist Resource(s)          |
|----------|----------------------------------------|----------|----------|------------------------------|
|          | (clock axis_clk rise edge)             | 0.000    | 0.000 r  |                              |
|          |                                        | 0.000    | 0.000 r  | axis_clk (IN)                |
|          | net (fo=0)                             | 0.000    | 0.000    | axis_clk                     |
|          |                                        |          | r        | axis_clk_IBUF_inst/I         |
|          | IBUF (Prop_ibuf_I_O)                   | 0.972    | 0.972 r  | axis_clk_IBUF_inst/O         |
|          | net (fo=1, unplaced)                   | 0.800    | 1.771    | axis_clk_IBUF                |
|          |                                        |          | r        | axis_clk_IBUF_BUFG_inst/I    |
|          | BUFG (Prop_bufg_I_O)                   | 0.101    | 1.872 r  | axis_clk_IBUF_BUFG_inst/O    |
|          | net (fo=176, unplaced)                 | 0.584    | 2.456    | axis_clk_IBUF_BUFG           |
|          | FDCE                                   |          |          | FSM_onehot_state_reg[3]/C    |
|          | FDCE (Prop_fdce_C_Q)                   | 0.478    | 2.934 r  | FSM_onehot_state_reg[3]/Q    |
|          | net (fo=113, unplaced)                 | 0.414    |          | FSM_onehot_state_reg_n_0_[3] |
|          |                                        |          |          | product0_0_i_1/I1            |
|          | LUT2 (Prop_lut2_I1_O)                  | 0.295    |          | product0_0_i_1/O             |
|          | net (fo=1, unplaced)                   |          |          | read_bramdata[16]            |
|          |                                        |          |          | product0_0/A[16]             |
|          | DSP48E1 (Prop_dsp48e1_A[16]_PCOUT[47]) | 4.036    |          | product0_0/PCOUT[47]         |
|          | net (fo=1, unplaced)                   | 0.055    |          | product0_0_n_106             |
|          |                                        |          |          | product0_1/PCIN[47]          |
|          | DSP48E1 (Prop_dsp48e1_PCIN[47]_P[0])   | 1.518    |          | product0_1/P[0]              |
|          | net (fo=2, unplaced)                   | 0.800    |          | product0_1_n_105             |
|          |                                        |          |          | product[19]_i_5/I0           |
|          | LUT2 (Prop_lut2_I0_O)                  | 0.124    |          | product[19]_i_5/O            |
|          | net (fo=1, unplaced)                   | 0.000    |          | product[19]_i_5_n_0          |
|          |                                        |          |          | product_reg[19]_i_2/S[1]     |
|          | CARRY4 (Prop_carry4_S[1]_CO[3])        |          |          | product_reg[19]_i_2/CO[3]    |
|          | net (fo=1, unplaced)                   | 0.009    |          | product_reg[19]_i_2_n_0      |
|          |                                        |          | r        | product_reg[23]_i_2/CI       |
|          | CARRY4 (Prop_carry4_CI_CO[3])          |          | 11.635   | product_reg[23]_i_2/CO[3]    |
|          | net (fo=1, unplaced)                   | 0.000    |          | product_reg[23]_i_2_n_0      |
|          |                                        |          | r        | product_reg[27]_i_2/CI       |
|          | CARRY4 (Prop_carry4_CI_CO[3])          | 0.117    | 11.752   | product_reg[27]_i_2/CO[3]    |
|          | net (fo=1, unplaced)                   | 0.000    |          | product_reg[27]_i_2_n_0      |
|          |                                        |          | r        | product_reg[31]_i_3/CI       |
|          | CARRY4 (Prop_carry4_CI_O[3])           |          |          | product_reg[31]_i_3/O[3]     |
|          | net (fo=1, unplaced)                   | 0.618    |          | product_reg[31]_i_3_n_4      |
|          |                                        |          |          | product[31]_i_2/I2           |
|          | LUT3 (Prop_lut3_I2_O)                  | 0.307    |          | product[31]_i_2/O            |
|          | net (fo=1, unplaced)                   | 0.000    |          | product[31]_i_2_n_0          |
|          | FDCE                                   |          | r        | product_reg[31]/D            |

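| Location | Delay type                 | Incr(ns) | Path(ns) | Netlist Resource(s)       |
|----------|----------------------------|----------|----------|---------------------------|
|          | (clock axis_clk rise edge) | 11.000   | 11.000 r |                           |
|          |                            | 0.000    | 11.000 r | axis_clk (IN)             |
|          | net (fo=0)                 | 0.000    | 11.000   | axis_clk                  |
|          |                            |          | r        | axis_clk_IBUF_inst/I      |
|          | IBUF (Prop_ibuf_I_O)       | 0.838    | 11.838 r | axis_clk_IBUF_inst/O      |
|          | net (fo=1, unplaced)       | 0.760    | 12.598   | axis_clk_IBUF             |
|          |                            |          | r        | axis_clk_IBUF_BUFG_inst/I |
|          | BUFG (Prop_bufg_I_O)       | 0.091    | 12.689 r | axis_clk_IBUF_BUFG_inst/O |
|          | net (fo=176, unplaced)     | 0.439    | 13.128   | axis_clk_IBUF_BUFG        |
|          | FDCE                       |          | r        | product_reg[31]/C         |
|          | clock pessimism            | 0.184    | 13.311   |                           |
|          | clock uncertainty          | -0.035   | 13.276   |                           |
|          | FDCE (Setup_fdce_C_D)      | 0.044    | 13.320   | product_reg[31]           |
|          | required time              |          | 13.320   |                           |
|          | arrival time               |          | -13.008  |                           |
|          | slack                      |          | 0.312    |                           |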

### 5. Simulation Waveform:

#### AXI\_Lite:



#### AXI\_Stream:



#### Tap Bram:



#### Data Bram:



#### FSM:

